The end goal of this project is to provide insights through data analysis that help all stakeholders (Lily Moreno, the Cyclistic marketing team, and the Cyclistic executive team) design marketing strategies aimed at converting casual riders into annual members.
To do that, three questions need to be addressed:
1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?
This project answers only the first question and provides a report with the following deliverables:
1. A clear statement of the business task
2. A description of all data sources used
3. Documentation of any cleaning and manipulation of data
4. A summary of the analysis
5. Supporting visualizations and key findings
6. Top three recommendations based on the analysis
For this project, we use the previous 12 months (2021-04 to 2022-03) of Cyclistic’s historical trip data, provided by Motivate International Inc. under this license, to analyze and identify trends.
library(readr)
X202104_trip_data <- read_csv("202104_trip_data.csv")
## Rows: 337230 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202104_trip_data)
library(readr)
X202105_trip_data <- read_csv("202105_trip_data.csv")
## Rows: 531633 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202105_trip_data)
library(readr)
X202106_trip_data <- read_csv("202106_trip_data.csv")
## Rows: 729595 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202106_trip_data)
library(readr)
X202107_trip_data <- read_csv("202107_trip_data.csv")
## Rows: 822410 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202107_trip_data)
library(readr)
X202108_trip_data <- read_csv("202108_trip_data.csv")
## Rows: 804352 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202108_trip_data)
library(readr)
X202109_trip_data <- read_csv("202109_trip_data.csv")
## Rows: 756147 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202109_trip_data)
library(readr)
X202110_trip_data <- read_csv("202110_trip_data.csv")
## Rows: 631226 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202110_trip_data)
library(readr)
X202111_trip_data <- read_csv("202111_trip_data.csv")
## Rows: 359978 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202111_trip_data)
library(readr)
X202112_trip_data <- read_csv("202112_trip_data.csv")
## Rows: 247540 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202112_trip_data)
library(readr)
X202201_trip_data <- read_csv("202201_trip_data.csv")
## Rows: 103770 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202201_trip_data)
library(readr)
X202202_trip_data <- read_csv("202202_trip_data.csv")
## Rows: 115609 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202202_trip_data)
library(readr)
X202203_trip_data <- read_csv("202203_trip_data.csv")
## Rows: 284042 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(X202203_trip_data)
Findings: All 12 data frames have the same structure
trip_data_combined<-rbind(X202104_trip_data,X202105_trip_data,X202106_trip_data,X202107_trip_data,X202108_trip_data,X202109_trip_data,X202110_trip_data,X202111_trip_data,X202112_trip_data,X202201_trip_data,X202202_trip_data,X202203_trip_data)
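The twelve near-identical import steps and the structure check can also be expressed as a loop. The following is a minimal sketch in base R, not the code used for this report: it writes two tiny stand-in CSV files to a temporary folder (their contents are invented), and `same_structure` is a hypothetical helper name. With the real data, `files` would list the twelve monthly CSVs, and `readr::read_csv()` could take the place of `read.csv()`.

```r
# Stand-in data: two small CSV files written to a temporary directory.
tmp_dir <- tempfile("trip_data_")
dir.create(tmp_dir)
write.csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(tmp_dir, "202104_trip_data.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "casual"),
          file.path(tmp_dir, "202105_trip_data.csv"), row.names = FALSE)

# Read every monthly file into a named list of data frames.
files <- list.files(tmp_dir, pattern = "_trip_data\\.csv$", full.names = TRUE)
monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(monthly) <- basename(files)

# Hypothetical helper: TRUE when all data frames share column names and classes.
same_structure <- function(dfs) {
  col_signature <- function(df) {
    paste(names(df), sapply(df, function(col) class(col)[1]), collapse = ";")
  }
  length(unique(vapply(dfs, col_signature, character(1)))) == 1
}

# Only bind the rows once the structure check passes.
stopifnot(same_structure(monthly))
combined <- do.call(rbind, monthly)
nrow(combined)
```

Checking the structure in code rather than by eye with `View()` makes the "same structure" finding reproducible when a new month is added.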
Findings: The combined data frame contains many “NA” values.
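To put a number on the missing values before dropping them, the NA count per column can be tabulated. This is a small sketch on invented toy data (the data frame `trips` is a stand-in for `trip_data_combined`), using base R only.

```r
# Toy stand-in for trip_data_combined; the values are invented for illustration.
trips <- data.frame(
  ride_id = c("r1", "r2", "r3", "r4"),
  start_station_name = c("Clark St & Elm St", NA, "Millennium Park", NA),
  end_lat = c(41.90, 41.88, NA, 41.89),
  stringsAsFactors = FALSE
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(trips))

# Rows with no missing value at all, i.e. the rows drop_na() would keep.
complete_rows <- sum(complete.cases(trips))

na_per_column
complete_rows
```

Comparing `complete_rows` with `nrow(trips)` shows how much data the NA removal costs, which is worth knowing before rows are discarded.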
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ dplyr 1.0.9
## ✔ tibble 3.1.6 ✔ stringr 1.4.0
## ✔ tidyr 1.2.0 ✔ forcats 0.5.1
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
trip_data_nadrop<-trip_data_combined %>% drop_na()
x<-trip_data_nadrop
x$ride_length<-difftime(trip_data_nadrop$ended_at,trip_data_nadrop$started_at)
trip_data_nadrop_ridelength<-x
outliers<-filter(trip_data_nadrop_ridelength,ride_length<=0)
Findings: There are 207 rides with “ride_length”<=0
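One detail worth noting about the ride-length step: `difftime()` defaults to `units = "auto"`, so the unit of the result depends on the data. In this report the output happened to be in seconds. The sketch below, on two invented rides, fixes the unit explicitly and flags non-positive durations.

```r
# Invented timestamps; the second ride "ends" before it starts.
started_at <- as.POSIXct(c("2021-06-01 08:00:00", "2021-06-01 09:30:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-06-01 08:25:00", "2021-06-01 09:29:00"), tz = "UTC")

# Fix the unit explicitly instead of relying on difftime()'s automatic choice.
ride_length <- difftime(ended_at, started_at, units = "secs")

# Plain numbers are easier to filter and summarise than difftime objects.
ride_secs <- as.numeric(ride_length)
is_invalid <- ride_secs <= 0

ride_secs
sum(is_invalid)
```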
trip_data_nadrop_ridelength_nooutliers<-filter(trip_data_nadrop_ridelength,ride_length>0)
x<-trip_data_nadrop_ridelength_nooutliers
x$weekday<-weekdays(trip_data_nadrop_ridelength_nooutliers$started_at)
x$month<-months(trip_data_nadrop_ridelength_nooutliers$started_at)
x$quarter<-quarters(trip_data_nadrop_ridelength_nooutliers$started_at)
x$hour<-format(trip_data_nadrop_ridelength_nooutliers$started_at,"%H")
trip_data_clean<-x
remove(x)
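`weekdays()` and `months()` return names in the session's locale, so the English factor levels used in the plots below would not match on a non-English system. The following sketch shows one locale-independent alternative on three invented start times; the variable names are illustrative and this is not the code used to build `trip_data_clean`.

```r
# Invented start times (a Saturday, a Monday and a Sunday), fixed to UTC.
started_at <- as.POSIXct(
  c("2021-04-03 08:15:00", "2021-07-05 17:40:00", "2022-01-09 23:05:00"),
  tz = "UTC"
)
parts <- as.POSIXlt(started_at)

# POSIXlt stores wday as 0-6 starting on Sunday and mon as 0-11.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[parts$wday + 1],
                  levels = c(day_names[2:7], day_names[1]))  # Monday first
month <- factor(month.name[parts$mon + 1], levels = month.name)
hour <- format(started_at, "%H")

data.frame(started_at, weekday, month, hour)
```

Storing the weekday and month as ordered factors up front also removes the need to re-declare the level order inside each `ggplot()` call.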
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=member_casual,fill=member_casual))+labs(title=("Which user type bikes more?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="User type", y="Number of rides",caption = "1e+06=1,000,000, Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")
trip_data_clean %>% group_by(member_casual) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100,total_ride_length=sum(ride_length),average_ride_length=sum(total_ride_length/count))
## # A tibble: 2 × 5
## member_casual count `%` total_ride_length average_ride_length
## <chr> <int> <dbl> <drtn> <drtn>
## 1 casual 2044256 44.0 3931016968 secs 1922.9573 secs
## 2 member 2596932 56.0 2016519224 secs 776.5006 secs
Findings:
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=member_casual,fill=member_casual))+labs(title=("Which bike type do riders like the most?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="User type", y="Number of rides",caption = "Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")+facet_wrap(~rideable_type)
trip_data_clean %>% group_by(member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 5 × 4
## # Groups: member_casual [2]
## member_casual rideable_type count `%`
## <chr> <chr> <int> <dbl>
## 1 member classic_bike 1989279 42.9
## 2 casual classic_bike 1252558 27.0
## 3 member electric_bike 607653 13.1
## 4 casual electric_bike 488183 10.5
## 5 casual docked_bike 303515 6.54
Findings:
Classic bikes are the most popular bike type for both members and casual riders.
Docked bikes are the least popular and, interestingly, are used only by casual riders. More data on the pricing plans for each bike type and on the financial side of the business is needed to find out why.
ggplot(data = trip_data_clean)+geom_bar(position = "dodge",mapping = aes(x=factor(weekday, levels=c('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday')),fill=member_casual))+labs(title=("How do riders behave differently on each day of the week?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="Day of the week", y="Number of rides",caption = "1e+06=1,000,000, Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")
trip_data_clean %>% group_by(member_casual,weekday) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual weekday count `%`
## <chr> <chr> <int> <dbl>
## 1 casual Saturday 458032 9.87
## 2 member Wednesday 411738 8.87
## 3 member Tuesday 403205 8.69
## 4 casual Sunday 402105 8.66
## 5 member Thursday 388455 8.37
## 6 member Friday 366695 7.90
## 7 member Monday 360579 7.77
## 8 member Saturday 350489 7.55
## 9 member Sunday 315771 6.80
## 10 casual Friday 288223 6.21
## 11 casual Monday 231686 4.99
## 12 casual Thursday 228205 4.92
## 13 casual Wednesday 222069 4.78
## 14 casual Tuesday 213936 4.61
Findings:
Members bike mostly on weekdays: their rides peak on Wednesday and are lowest on weekends.
Casual riders bike mostly on weekends, with usage starting to increase on Friday. They bike much less on the other weekdays (Monday through Thursday), when the number of rides stays relatively low and steady.
On weekdays, members take more rides than casual riders, while on weekends casual riders take more rides than members.
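The weekday/weekend contrast can be summarised as one number per user type. The sketch below copies the ride counts from the weekday table above into a small data frame (the object names are illustrative) and computes the share of each group's rides that fall on Saturday or Sunday.

```r
# Ride counts copied from the weekday table above.
weekday_counts <- data.frame(
  member_casual = rep(c("casual", "member"), each = 7),
  weekday = rep(c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"), times = 2),
  count = c(231686, 213936, 222069, 228205, 288223, 458032, 402105,
            360579, 403205, 411738, 388455, 366695, 350489, 315771)
)
weekday_counts$is_weekend <- weekday_counts$weekday %in% c("Saturday", "Sunday")

# Weekend rides and total rides per user type.
weekend_rides <- tapply(weekday_counts$count * weekday_counts$is_weekend,
                        weekday_counts$member_casual, sum)
total_rides <- tapply(weekday_counts$count, weekday_counts$member_casual, sum)

# Share of each user type's rides taken on Saturday or Sunday, in percent.
weekend_share <- 100 * weekend_rides / total_rides
round(weekend_share, 1)
```

With these counts, roughly 42% of casual rides but only about 26% of member rides fall on a weekend. If rides were spread evenly across the week, two days out of seven would account for about 29%.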
ggplot(data = trip_data_clean)+geom_bar(mapping = aes(x=hour,fill=member_casual))+labs(title=("How do riders behave differently across all hours of each day of the week?"),subtitle ="Casual vs Member from 2021-04 to 2022-03", x="Hour of the day", y="Number of rides",caption = "Data provided by Motivate International Inc.")+scale_fill_discrete(name="User type")+facet_wrap(factor(weekday,levels = c('Monday','Tuesday','Wednesday','Thursday', 'Friday','Saturday','Sunday'))~.)+theme(axis.text.x = element_text(angle = 90,size = 7))
trip_data_clean %>% group_by(member_casual,hour) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(hour))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 48 × 4
## # Groups: member_casual [2]
## member_casual hour count `%`
## <chr> <chr> <int> <dbl>
## 1 casual 23 58820 1.27
## 2 member 23 40777 0.879
## 3 casual 22 76629 1.65
## 4 member 22 60496 1.30
## 5 casual 21 82936 1.79
## 6 member 21 80117 1.73
## 7 casual 20 97924 2.11
## 8 member 20 108920 2.35
## 9 casual 19 135346 2.92
## 10 member 19 164261 3.54
## # … with 38 more rows
Findings:
On weekdays, members and casual riders show a very similar pattern: the number of rides reaches its first peak of the day at 8 am, dips slightly from 9 am to 10 am, then rises from 11 am to a second peak around 5 pm.
On weekends, the number of rides does not reach the usual weekday level until 9 or 10 am, then keeps rising and stays relatively high until 6 or 7 pm.
ggplot(data = trip_data_clean)+geom_bar(position="dodge",mapping = aes(x=quarter,fill=member_casual))+labs(title=("How do riders behave differently across quarters?"),subtitle ="Casual vs Member", x="Quarter", y="Number of rides")+scale_fill_discrete(name="User type")
trip_data_clean %>% group_by(member_casual,quarter) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups: member_casual [2]
## member_casual quarter count `%`
## <chr> <chr> <int> <dbl>
## 1 casual Q3 1003784 21.6
## 2 member Q3 983989 21.2
## 3 member Q2 716515 15.4
## 4 casual Q2 641425 13.8
## 5 member Q4 606053 13.1
## 6 casual Q4 304149 6.55
## 7 member Q1 290375 6.26
## 8 casual Q1 94898 2.04
Findings:
Rides peaked in Q3 and bottomed out in Q1 for both members and casual riders.
Interestingly, Q3 is the only quarter in which casual riders took more rides than members.
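The Q3 observation can be checked directly from the quarterly table. The sketch below re-enters those counts in a wide layout (the column and object names are illustrative) and lists the quarters in which casual riders took more rides than members.

```r
# Ride counts copied from the quarterly table above.
quarter_counts <- data.frame(
  quarter = c("Q1", "Q2", "Q3", "Q4"),
  casual  = c(94898, 641425, 1003784, 304149),
  member  = c(290375, 716515, 983989, 606053)
)

# Positive values mean casual riders took more rides than members.
quarter_counts$casual_minus_member <- quarter_counts$casual - quarter_counts$member

# Quarters in which casual riders lead.
casual_lead <- quarter_counts$quarter[quarter_counts$casual_minus_member > 0]

quarter_counts
casual_lead
```

The same table shows that the gap is widest in Q1 and Q4, when members take two to three times as many rides as casual riders.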
ggplot(data = trip_data_clean)+geom_bar(position="dodge",mapping = aes(x=factor(month,levels = (c('January','February','March','April','May','June','July','August','September','October','November','December'))),fill=member_casual))+labs(title=("How do riders behave differently across all months of the year?"),subtitle ="Casual vs Member", x="Month", y="Number of rides")+scale_fill_discrete(name="User type")+theme(axis.text.x = element_text(angle = 45))
Findings:
Throughout the year, the number of rides taken by members and by casual riders again moved in the same direction.
Rides by members peaked in August, while rides by casual riders peaked slightly earlier, in July.
First, let’s get a map of Chicago. The ‘bbox’ values were chosen by looking up the area on OpenStreetMap.
library(ggplot2)
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
install.packages("ggmap")
## Installing ggmap [3.0.0] ...
## OK [linked cache]
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
map_chicago<-get_stamenmap(bbox = c(left=-88.3, bottom=41.3, right=-87.0, top=42.4), maptype="terrain",zoom = 11)
## 81 tiles needed, this may take a while (try a smaller zoom).
ggmap(map_chicago)
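As an alternative to reading the bounding box off a map by hand, it could be derived from the ride coordinates themselves. This is only a sketch under assumptions: `bbox_from_points` is a hypothetical helper, the padding of 0.05 degrees is an arbitrary choice, and the coordinates are invented. The result uses the same `left`/`bottom`/`right`/`top` names as the `bbox` argument above.

```r
# Hypothetical helper: bounding box around a set of points, with some padding.
bbox_from_points <- function(lng, lat, pad = 0.05) {
  c(left   = min(lng, na.rm = TRUE) - pad,
    bottom = min(lat, na.rm = TRUE) - pad,
    right  = max(lng, na.rm = TRUE) + pad,
    top    = max(lat, na.rm = TRUE) + pad)
}

# Invented coordinates standing in for start_lng and start_lat.
start_lng <- c(-87.65, -87.62, -87.70, NA)
start_lat <- c(41.90, 41.88, 41.95, 41.89)

bbox <- bbox_from_points(start_lng, start_lat)
bbox
```

A tighter box derived this way would need fewer map tiles at the same zoom level, at the cost of cutting off any rides that fall outside the padded range.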
Second, plot the starting and ending points of rides on the map of Chicago.
ggmap(map_chicago)+geom_jitter(trip_data_clean,mapping = aes(x=start_lng,y=start_lat),color="yellow")+facet_wrap(~member_casual)+labs(title = "Where do riders start their rides?",subtitle = "Casual vs Member from 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
## Warning: Removed 1 rows containing missing values (geom_point).
ggmap(map_chicago)+geom_jitter(trip_data_clean,mapping = aes(x=end_lng,y=end_lat),color="red")+facet_wrap(~member_casual)+labs(title = "Where do riders go with their rides?",subtitle = "Casual vs Member from 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
Findings:
ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="member"),mapping = aes(x=end_lng,y=end_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do members go with their rides?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="casual"),mapping = aes(x=end_lng,y=end_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do the casual go with their rides?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="member"),mapping = aes(x=start_lng,y=start_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do members start their rides with different bikes?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
ggmap(map_chicago)+geom_jitter(filter(trip_data_clean,member_casual=="casual"),mapping = aes(x=start_lng,y=start_lat),color="red")+facet_grid(~rideable_type)+labs(title = "Where do the casual start their rides with different bikes?",subtitle = "From 2021-04 to 2022-03", x="Lng", y=" Lat")+theme(axis.text.x = element_text(angle = 90))
## Warning: Removed 1 rows containing missing values (geom_point).
trip_data_clean %>% group_by(end_station_name,member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'end_station_name', 'member_casual'. You
## can override using the `.groups` argument.
## # A tibble: 3,746 × 5
## # Groups: end_station_name, member_casual [1,691]
## end_station_name member_casual rideable_type count `%`
## <chr> <chr> <chr> <int> <dbl>
## 1 Streeter Dr & Grand Ave casual classic_bike 38496 0.829
## 2 Streeter Dr & Grand Ave casual docked_bike 20226 0.436
## 3 Clark St & Elm St member classic_bike 18720 0.403
## 4 Michigan Ave & Oak St casual classic_bike 18090 0.390
## 5 Millennium Park casual classic_bike 18046 0.389
## 6 Wells St & Concord Ln member classic_bike 17988 0.388
## 7 Kingsbury St & Kinzie St member classic_bike 17894 0.386
## 8 Wells St & Elm St member classic_bike 15885 0.342
## 9 Broadway & Barry Ave member classic_bike 14364 0.309
## 10 Theater on the Lake casual classic_bike 14222 0.306
## # … with 3,736 more rows
Findings:
From the data provided, we can clearly see that members took more rides (56%) than casual riders (44%). However, we do not know how many distinct members or casual riders took those rides. Unique user IDs linked to rides should be requested to find out how often members and casual riders bike, respectively.
From the summary table above, casual riders spent on average 1,923 seconds (about 32 minutes) per ride, while members spent 777 seconds (about 13 minutes) per ride. This is consistent with the hypothesis that casual riders are more likely to ride for leisure, while members are more likely to ride to commute efficiently.
On the maps of starting and ending points, members and casual riders show nearly identical shapes apart from a few outliers. In other words, riders start and end their rides in largely the same areas regardless of user type, except that casual riders occasionally bike farther from the city center along the shoreline. These outliers also support the view that casual riders are more likely to bike for leisure.
Members tend to bike more on weekdays, commuting to work from 7 to 8 am and back from 5 to 6 pm. Their rides are less sensitive to season and weather because they use the bikes as a necessary means of commuting. Casual riders, on the other hand, tend to bike for leisure, mostly on Fridays and weekends, and are more sensitive to season and weather. For both groups, the majority of rides happened in the afternoon, before the evening.
Almost 70% of all rides were taken on classic bikes: 43% by members and 27% by casual riders. About 7% of rides were taken on docked bikes, all by casual riders. The remaining 24% or so were on electric bikes: 13% by members and just over 10% by casual riders. Combining these results with additional data on the pricing plans for each bike type and with financial data should help refine Cyclistic’s offerings to convert casual riders into annual members.
Based on the analysis above, members probably bike both for leisure and to commute, while casual riders mainly bike for leisure.
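The headline figures in these findings can be re-derived from the summary table near the top of the analysis. The sketch below re-enters that table's counts and total ride lengths by hand (the object and column names are illustrative) and recomputes each group's share of rides and its average ride length in minutes.

```r
# Counts and total ride lengths (in seconds) copied from the summary table.
summary_by_type <- data.frame(
  member_casual   = c("casual", "member"),
  count           = c(2044256, 2596932),
  total_ride_secs = c(3931016968, 2016519224)
)

# Share of all rides, in percent.
summary_by_type$share_pct <-
  100 * summary_by_type$count / sum(summary_by_type$count)

# Average ride length, converted from seconds to minutes.
summary_by_type$avg_ride_min <-
  summary_by_type$total_ride_secs / summary_by_type$count / 60

# How many times longer the average casual ride is than the average member ride.
length_ratio <- summary_by_type$avg_ride_min[1] / summary_by_type$avg_ride_min[2]

summary_by_type
round(length_ratio, 1)
```

On these numbers the average casual ride lasts about 32.0 minutes and the average member ride about 12.9 minutes, so a typical casual ride is roughly two and a half times as long.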
trip_data_clean %>% filter(member_casual=="casual") %>% group_by(end_station_name,member_casual,rideable_type) %>% summarise(count=length(ride_id),"%"=length(ride_id)/nrow(trip_data_clean)*100) %>% arrange(desc(count))
## `summarise()` has grouped output by 'end_station_name', 'member_casual'. You
## can override using the `.groups` argument.
## # A tibble: 2,224 × 5
## # Groups: end_station_name, member_casual [852]
## end_station_name member_casual rideable_type count `%`
## <chr> <chr> <chr> <int> <dbl>
## 1 Streeter Dr & Grand Ave casual classic_bike 38496 0.829
## 2 Streeter Dr & Grand Ave casual docked_bike 20226 0.436
## 3 Michigan Ave & Oak St casual classic_bike 18090 0.390
## 4 Millennium Park casual classic_bike 18046 0.389
## 5 Theater on the Lake casual classic_bike 14222 0.306
## 6 Wells St & Concord Ln casual classic_bike 12400 0.267
## 7 DuSable Lake Shore Dr & North Blvd casual classic_bike 12348 0.266
## 8 Shedd Aquarium casual classic_bike 11393 0.245
## 9 Clark St & Lincoln Ave casual classic_bike 10948 0.236
## 10 Lake Shore Dr & North Blvd casual classic_bike 10785 0.232
## # … with 2,214 more rows